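In a hybrid neural network, the expensive convolutional layers are replaced by a non-trainable fixed transform, greatly reducing the number of parameters. In previous work, good results were obtained by replacing the convolutions with wavelets. However, wavelet-based hybrid networks inherit the wavelet's lack of vanishing moments along curves and its axis bias. We propose using shearlets, which offer robust support for important image features such as edges, ridges, and blobs. The resulting network is called the Complex Shearlets Network (CoShNet). It was tested on Fashion-MNIST against ResNet-50 and ResNet-18, obtaining 92.2% versus 90.7% and 91.8%, respectively. The proposed network has 49.9k parameters versus 11.18M for ResNet-18 and uses 52 times fewer FLOPs. Finally, it trained in fewer epochs than the 200 required by ResNet and needs no hyperparameter tuning or regularization. Code: https://github.com/ujjawal-k-panchal/coshnet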
translated by Google Translate
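Feature engineering has become one of the most important steps for improving model prediction performance and producing quality datasets. However, this process requires non-trivial domain knowledge and is time-consuming. Automating it has therefore become an active area of research and is of interest in industrial applications. In this paper, a novel method called Meta-learning and Causality Based Feature Engineering (MACFE) is proposed. Our method is based on meta-learning, feature distribution encoding, and causality-based feature selection. In MACFE, meta-learning is used to find the best transformations, and the search is then accelerated by pre-selecting "original" features according to their causal relevance. Experimental evaluations on popular classification datasets show that MACFE improves prediction performance across eight classifiers, outperforming current state-of-the-art methods by at least 6.54% on average and improving on the best previous work by 2.71%.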
Biometrics is the science of identifying an individual based on their intrinsic anatomical or behavioural characteristics, such as fingerprints, face, iris, gait, and voice. Iris recognition is one of the most successful methods because it exploits the rich texture of the human iris, which is unique even for twins and does not degrade with age. Modern approaches to iris recognition utilize deep learning to segment the valid portion of the iris from the rest of the eye, so it can then be encoded, stored and compared. This paper aims to improve the accuracy of iris semantic segmentation systems by introducing a novel data augmentation technique. Our method can transform an iris image with a certain dilation level into any desired dilation level, thus augmenting the variability and number of training examples from a small dataset. The proposed method is fast and does not require training. The results indicate that our data augmentation method can improve segmentation accuracy up to 15% for images with high pupil dilation, which creates a more reliable iris recognition pipeline, even under extreme dilation.
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to specify the output image using a text prompt. Inspired by the success of those models, and led by the notion that language was already developed to describe the elements of visual contexts that humans find most important, we introduce an embedding model closely related to a vision-language model. Specifically, we introduce the embedding model S-MAGMA: a 13 billion parameter multimodal decoder combining components from an autoregressive vision-language model MAGMA and biases finetuned for semantic search.
Reinforcement Learning has emerged as a strong alternative to solve optimization tasks efficiently. The use of these algorithms highly depends on the feedback signals provided by the environment in charge of informing about how good (or bad) the decisions made by the learned agent are. Unfortunately, in a broad range of problems the design of a good reward function is not trivial, so in such cases sparse reward signals are instead adopted. The lack of a dense reward function poses new challenges, mostly related to exploration. Imitation Learning has addressed those problems by leveraging demonstrations from experts. In the absence of an expert (and its subsequent demonstrations), an option is to prioritize well-suited exploration experiences collected by the agent in order to bootstrap its learning process with good exploration behaviors. However, this solution highly depends on the ability of the agent to discover such trajectories in the early stages of its learning process. To tackle this issue, we propose to combine imitation learning with intrinsic motivation, two of the most widely adopted techniques to address problems with sparse reward. In this work intrinsic motivation is used to encourage the agent to explore the environment based on its curiosity, whereas imitation learning allows repeating the most promising experiences to accelerate the learning process. This combination is shown to yield an improved performance and better generalization in procedurally-generated environments, outperforming previously reported self-imitation learning methods and achieving equal or better sample efficiency with respect to intrinsic motivation in isolation.
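The combination described above can be sketched with two small pieces: an intrinsic bonus that rewards novel states, and a buffer that retains the most promising episodes for later imitation. This is only an illustrative sketch. The count-based bonus, the ranking of episodes by a single score, and the names `CountBasedBonus` and `TopEpisodeBuffer` are assumptions chosen for brevity, not the method used in the paper.

```python
import heapq
import itertools
import math
from collections import Counter


class CountBasedBonus:
    """Intrinsic reward that shrinks as a state is visited more often (assumed form)."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.visits = Counter()

    def reward(self, state):
        # States must be hashable (e.g. tuples); novelty decays as 1/sqrt(count).
        self.visits[state] += 1
        return self.scale / math.sqrt(self.visits[state])


class TopEpisodeBuffer:
    """Keeps the `capacity` highest-scoring episodes for later imitation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (score, insertion_id, trajectory)
        self._ids = itertools.count()  # unique tie-breaker so trajectories are never compared

    def add(self, score, trajectory):
        heapq.heappush(self._heap, (score, next(self._ids), trajectory))
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)  # evict the lowest-scoring episode

    def best_first(self):
        return [traj for _, _, traj in sorted(self._heap, reverse=True)]


bonus = CountBasedBonus()
buffer = TopEpisodeBuffer(capacity=2)
episodes = [
    (0.0, [(0, 0), (0, 1)]),
    (1.0, [(0, 0), (1, 1)]),
    (0.5, [(0, 0), (0, 0)]),
]
for extrinsic_return, states in episodes:
    # In a full agent these bonuses would be added to the environment rewards.
    intrinsic = [bonus.reward(s) for s in states]
    buffer.add(extrinsic_return, states)
```

In a complete agent, the trajectories returned by `best_first()` would feed a supervised imitation loss alongside the usual policy update; that part is omitted here.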
Machine-Learned Likelihoods (MLL) is a method that combines modern machine-learning classification techniques with likelihood-based inference tests, making it possible to estimate the experimental sensitivity of high-dimensional data sets. We extend the MLL method by including the exclusion hypothesis tests and show that the addition of Kernel Density Estimators avoids the need to bin the classifier output in order to extract the resulting one-dimensional signal and background probability density functions. We first test our method on toy models generated with multivariate Gaussian distributions, where the true probability distribution functions are known. We then apply it to a case of interest in the search for new physics at the HL-LHC, in which a $Z^\prime$ boson decays into lepton pairs, comparing the performance of our method for estimating 95\% CL exclusion limits to the results obtained applying a binned likelihood to the machine-learning classifier output.
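To illustrate how a kernel density estimate can replace binning of the classifier output, the sketch below fits one-dimensional Gaussian KDEs to toy signal and background scores and evaluates an unbinned log-likelihood ratio. Everything here is an assumption made for illustration: the Beta-distributed toy scores, the fixed bandwidth, the expected yields, and the extended-likelihood form $-S + \sum_i \log\left(1 + \frac{S\,p_s(o_i)}{B\,p_b(o_i)}\right)$ are not taken from the paper.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


def log_likelihood_ratio(scores, pdf_sig, pdf_bkg, n_sig, n_bkg):
    """Unbinned extended log-likelihood ratio: signal+background over background-only."""
    total = -n_sig
    for o in scores:
        total += math.log(1.0 + (n_sig * pdf_sig(o)) / (n_bkg * pdf_bkg(o)))
    return total


# Toy classifier outputs in [0, 1]: signal peaks high, background peaks low (assumed shapes).
rng = random.Random(0)
sig_train = [rng.betavariate(5, 2) for _ in range(500)]
bkg_train = [rng.betavariate(2, 5) for _ in range(500)]

pdf_sig = gaussian_kde(sig_train, bandwidth=0.05)
pdf_bkg = gaussian_kde(bkg_train, bandwidth=0.05)

# A background-only pseudo-experiment evaluated without any histogram.
observed = [rng.betavariate(2, 5) for _ in range(50)]
print(log_likelihood_ratio(observed, pdf_sig, pdf_bkg, n_sig=10.0, n_bkg=50.0))
```

In practice the bandwidth would be tuned (for example by cross-validation) rather than fixed by hand.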
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
Specular microscopy assessment of the human corneal endothelium (CE) in Fuchs' dystrophy is challenging due to the presence of dark image regions called guttae. This paper proposes a UNet-based segmentation approach that requires minimal post-processing and achieves reliable CE morphometric assessment and guttae identification across all degrees of Fuchs' dystrophy. We cast the segmentation problem as a regression task of the cell and gutta signed distance maps instead of a pixel-level classification task as typically done with UNets. Compared to the conventional UNet classification approach, the distance-map regression approach converges faster in clinically relevant parameters. It also produces morphometric parameters that agree with the manually segmented ground-truth data, with an average cell density difference of -41.9 cells/mm² (95% confidence interval (CI) [-306.2, 222.5]) and an average difference in mean cell area of 14.8 µm² (95% CI [-41.9, 71.5]). These results suggest a promising alternative for CE assessment.
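To make the regression target concrete, the following sketch converts a binary mask into a signed distance map and recovers the mask by thresholding at zero. It is a brute-force illustration for tiny grids. The sign convention (positive inside, negative outside), the distance to the nearest pixel of the opposite class, and the function name `signed_distance_map` are assumptions, not details taken from the paper.

```python
import math


def signed_distance_map(mask):
    """Per-pixel signed Euclidean distance to the nearest pixel of the opposite class.

    `mask` is a list of rows of 0/1 values. Pixels inside an object (1) get a
    positive distance and background pixels (0) a negative one (assumed convention).
    """
    height, width = len(mask), len(mask[0])
    inside = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
    outside = [(r, c) for r in range(height) for c in range(width) if not mask[r][c]]
    sdm = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            targets = outside if mask[r][c] else inside
            if not targets:
                continue  # uniform mask: distance undefined, leave 0.0
            dist = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
            sdm[r][c] = dist if mask[r][c] else -dist
    return sdm


def mean_squared_error(pred, target):
    """Regression loss a network would minimise against the distance map."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
sdm = signed_distance_map(mask)
recovered = [[1 if value > 0 else 0 for value in row] for row in sdm]
```

For real images, a linear-time distance transform from an image-processing library would replace the quadratic loop used here.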
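Humans exploit prior knowledge to describe images and are able to adapt their explanations to specific contextual information, even to the extent of inventing plausible explanations when the contextual information and the image do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images, and their associated descriptions to produce contextualized captions. In particular, a similar Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context, which allows us to explore the limits of a model in adjusting captions to different contextual information. A particularly challenging task in this domain is dealing with out-of-dictionary words and named entities. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task yields an improvement over baseline models. Furthermore, we verify that a model pre-trained with the MNEM objective on Wikipedia generalizes well to a news captioning dataset. Additionally, we define two different test splits according to the difficulty of the captioning task. We offer insights into the role and importance of each modality and highlight the limitations of our model. The code, models, and data splits will be made publicly available upon acceptance.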
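A pretext task of this kind can be pictured as masking the tokens that belong to named entities and asking the model to predict them. The sketch below shows only that data-preparation step. The span format (start inclusive, end exclusive), the `[MASK]` token, the decision to mask every entity token, and the helper name `mask_named_entities` are assumptions for illustration, not the paper's actual procedure.

```python
def mask_named_entities(tokens, entity_spans, mask_token="[MASK]"):
    """Replace tokens inside entity spans with `mask_token`.

    Returns the masked token list and a dict mapping each masked position to
    the original token, which serves as the prediction target.
    """
    masked = list(tokens)
    targets = {}
    for start, end in entity_spans:  # end is exclusive
        for position in range(start, end):
            targets[position] = tokens[position]
            masked[position] = mask_token
    return masked, targets


tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
entity_spans = [(0, 2), (5, 6)]  # hypothetical output of a named-entity tagger
masked_tokens, targets = mask_named_entities(tokens, entity_spans)
```

The entity spans would normally come from a named-entity recogniser or from article hyperlinks; here they are written by hand.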
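In this paper, we present a framework for multilingual scene text visual question answering that handles new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA), in which the question can be asked in different languages and is not necessarily aligned with the language of the scene text. We therefore first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. With this in mind, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot, and we demonstrate that the models can perform on par in the zero-shot setting. We further provide extensive experiments and show the effectiveness of adapting multilingual language models to STVQA tasks.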